IMPORTANCE SAMPLING IN MONTE CARLO ANALYSES


ABSTRACT 

Importance sampling is described as used in Monte Carlo analyses. An
intuitive justification of the procedure is developed through a
non-mathematical consideration of the fundamental random processes
involved. The sampling procedure and its efficiency are illustrated by
numerical examples.

1. Introduction. The author has consulted with operations analysts
concerning the statistical problems of Monte Carlo sampling. Inevitably
importance sampling is suggested, and this procedure disturbs the
analyst. The difficulty is not simply that importance sampling is not
understood, but that superficially it appears absurd. For example, if a
Monte Carlo analysis is to evaluate the effectiveness of a weapon one of
whose parameters is a reliability coefficient known to be between .30
and .75, the analyst might be told to carry out the simulation using .25
for the reliability coefficient. Such a proposal can be puzzling, and
can generate resistance that is not easily overcome.

The limited understanding of importance sampling is unfortunate. The
technique is easy to employ, at least in its simplest form. It can be
highly efficient. When understood, it is a simple, natural procedure
which does not require professional ability in statistics. The following
exposition was written for the author's clients. The discussion is
intended to be an elementary presentation of fundamental statistical
ideas which should be familiar to an operations analyst interested in
Monte Carlo.

This paper is an expository, largely non-technical discussion of
statistical sampling problems that arise in Monte Carlo analyses. Except
for the appendices, no statistical knowledge is presumed beyond
recognition of the nature of a probability distribution. Techniques are
not elaborated. In relatively simple Monte Carlo analyses the procedures
discussed can be employed adequately by the non-mathematician. In the
case of an elaborate Monte Carlo, the ideas of this paper should form
the basis for coordination between the operations analysts and the
mathematical statistician.

The  paper  starts  with  an  informal  statement  of  what  is  meant  by 
a  Monte  Carlo  analysis.  There  follows  a  digression  on  stratified 
sampling;  this  digression  will  bring  to  light  some  important  elements  in 
the  Monte  Carlo  analysis.  Finally  the  discussion  of  importance  sampling 
in  Monte  Carlo  statistical  analysis  is  presented  through  simple  numerical 
illustrations.  Mathematical  derivations  are  placed  in  appendices. 

2.  Monte  Carlo.  This  section  indicates  what  is  meant  by  a  Monte 
Carlo  analysis.  The  discussion  introduces  a  simple  example  which  will 
be  used  later  as  a  numerical  illustration. 

Suppose that a machine starts at time zero and runs until the time of
failure x. The time x is random with probability density
$\lambda \exp(-\lambda x)$, $0 \le x < \infty$. At the time of failure
the machine must be scrapped with probability p, 0 < p < 1, but with
probability q = 1 - p the machine is repaired. If repaired, the machine
runs from time x to x + x' with x' distributed as x. Again the machine
survives with probability q, and in case of survival the third failure
occurs at time x + x' + x'' with x'' distributed as x. The process
terminates when the machine is scrapped.

Suppose that we wish to know the probability that the machine will
survive until time X (there may be failures before time X, but each of
these failures is repaired). This probability can be computed
analytically, and this computation appears below in Appendix B.
Alternatively one could use the following analysis. One would draw a
random number from the exponential distribution with probability density
$\lambda \exp(-\lambda x)$, and this number would simulate the time to
the first failure. Another random number (uniformly distributed) would
determine whether the machine could be repaired. If a repair is
effected, a second generation from the exponential distribution would
determine the time between the first and second failures. It is obvious
how the simulation would continue until the machine would be scrapped.
If such a process were carried out several times, the fraction of times
that the machine survived to time X would be used as an estimate of the
desired probability.
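
The sampling procedure just described can be sketched in Python as
follows. This is a minimal illustration and not part of the original
exposition: the function names, the failure rate 1.0, the scrapping
probability .5, the horizon 1.0, and the number of runs are all
assumptions chosen only to make the sketch runnable.

```python
import random


def survives_until(horizon, rate, p_scrap, rng):
    """Simulate one machine history.

    Returns True if the machine has not been scrapped before `horizon`.
    """
    t = 0.0
    while True:
        # Time from the last (re)start to the next failure is exponential.
        t += rng.expovariate(rate)
        if t >= horizon:
            return True   # still in service at the horizon
        if rng.random() < p_scrap:
            return False  # scrapped at this failure
        # Otherwise the machine is repaired and runs again.


def estimate_survival(horizon, rate, p_scrap, n_runs, seed=0):
    """Fraction of simulated machines that survive to `horizon`."""
    rng = random.Random(seed)
    survived = sum(
        survives_until(horizon, rate, p_scrap, rng) for _ in range(n_runs)
    )
    return survived / n_runs


# Hypothetical parameter values, used only for illustration.
estimate = estimate_survival(horizon=1.0, rate=1.0, p_scrap=0.5, n_runs=20000)
print(f"estimated survival probability: {estimate:.4f}")
```

Each call to `survives_until` corresponds to one simulated machine
history; the estimate is the plain sample fraction, that is, simple
random sampling with no distortion.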

If one's interest were in this problem per se, the analytic solution
would be much preferred to the statistical sampling procedure. However,
later in the paper we shall consider a Monte Carlo analysis of this
problem. The fact that the problem can be handled analytically will
permit evaluations of the Monte Carlo analysis that would be impossible
in case of a problem appropriate for Monte Carlo analysis; typically a
Monte Carlo analysis is used only when an analytic solution is not
obtainable. In this paper, somewhat incorrectly, an "analytic" procedure
is one that does not involve statistical sampling.

The statistical sampling procedure as described above is based upon a
model whose random elements are given analytically. This distinguishes
the problem from a typical survey statistics problem. If one were to
estimate the tobacco consumption per capita from a sample, one might
consider the consumption of an individual to be a random variable. But
in that case, the distribution of the random variable is unknown. One
could not replace a survey of people by some desk procedure of
simulating people and designating their consumptions by numbers read
from a table. However, regardless of whether the sample data are
obtained from a desk simulation or a field survey, the subsequent
mathematical analysis of the sample data could be the same.

Some writers would distinguish between the machine-failure and
tobacco-consumption problems by saying that the first can be solved by a
Monte Carlo analysis, the term Monte Carlo indicating that one knows
explicitly the distributions of all the random elements in the problem.
In this sense the term Monte Carlo signifies that one could simulate the
random process by a desk calculation which uses tables of random numbers
or by a computer program which generates random numbers. With this
definition Monte Carlo does not require any distinctive mathematical
analysis. The techniques of analysis were in use before the term Monte
Carlo was employed. The problems to be considered in this paper are
Monte Carlo problems in the sense of the above definition. The objective
of the paper could be stated as that of efficiency in Monte Carlo
analyses. We shall retain this definition at present, but an alternative
definition will appear below.

Monte Carlo analysis, as so defined, is almost a general, effective
procedure which enables one to solve many problems too complex for
mathematical analysis. But there is one unfortunate fact. Such Monte Carlo
analysis  is  costly.  In  one  problem  it  required  a  high-speed  computer 
to  run  1^-  hours  to  obtain  a  single  sample  value.  At  least  20  runs  were 
required  for  even  a  small  sample,  and  results  were  desired  for  hundreds 
of  sets  of  model  parameters. 

There are ways to reduce the cost of such Monte Carlo analyses. Computer
capabilities can be increased, and judicious adaptation of models can
reduce costs. But a much easier way to reduce costs is through the
employment of efficient sampling techniques. The nature and efficacy of
importance sampling, one of these techniques, is the subject of this
paper.

In importance sampling one considers a statistical sampling problem of
the type designated above as a Monte Carlo problem. However, one does
not carry out the sampling in the manner suggested by the problem.
Rather a new random process is introduced in place of the original. The
nature of this substitution will come to light in later sections of the
paper. At present we merely remark that some writers reserve the term
Monte Carlo for a method of analysis in which one creates a random
variable whose expected value is the solution of a given problem. This
random variable is artificial with respect to the given problem. In the
machine-failure problem one is concerned with the random variable which
is 1 if a machine is scrapped prior to time X, and which is 0 if the
machine survives until time X (the expected value of this random
variable is the probability that a machine is scrapped prior to time X).
This random variable is given in the statement of the problem, and it is
not created by the mathematician during the course of the analysis.
However, in the solution constructed below by a Monte Carlo analysis,
this random variable is not used. Rather the mathematician creates
another random variable, whose expected value is the same as that of the
given random variable, but whose expected value is cheaper to obtain by
statistical sampling.

3. Stratified Sampling. The problem of section 2 could be analysed by
simulating the histories of many machines and computing statistics of
the outcomes. The statistician would say that data were obtained by
simple random sampling. But in costly statistical analyses it is usually
possible to replace simple random sampling by some more efficient
procedure. Before describing such a procedure for use in Monte Carlo
analyses, we shall examine some features of stratified sampling. This
digression will illustrate in simple form the basic idea to be employed
in importance sampling.

As a hypothetical illustrative example we suppose that a hotel wishes to
estimate the mean annual expenditures by its guests in barber shops and
beauty parlors. It is known that the expenditures by women differ more
widely than expenditures by men. Many men get a $2 haircut every 2 weeks
at an annual cost of roughly $50; expenditures of as much as $100 or as
little as $25 are found occasionally. Expenditures by women can vary
from nothing to over $500. The mean expenditure for women is harder to
estimate than the mean for men.

We assume that 80% of the hotel guests are men. Suppose that a sample of
size 15 is to be taken (we use an absurdly small sample size to simplify
the exposition). If simple random sampling were employed, we would
expect the sample to consist of 12 men (80% of 15) and 3 women. However,
one might decide to sample 5 men and 10 women. Suppose the expenditures
of the members of such a sample turned out to be in dollars:

Men:  50,  50,  50,  50,  100 

Women:  0,  50,  100,  100,  200,  200,  200,  300,  500,  800. 

It is intuitively clear that such data will lead to a more accurate
estimate of the over-all average than would the expenditures of 12 men
and 3 women.

To analyse these data we calculate M and W, the means of the 5 male
expenditures and the 10 female expenditures, respectively. The results
are

$$M = 60, \qquad W = 245.$$

These means emphasize the fact that stratified sampling is more
advantageous than simple random sampling in the present situation. We
can observe that 30% of the women have expenditures greater than the
mean W. This reflects the fact that a minority of the women have an
important influence on the mean W and on the mean when both sexes are
pooled into a single distribution. It is likely that the estimates to be
made would be more accurate if an even greater fraction of the sample
consisted of women; however, the optimum fraction is not relevant to the
following discussion.*

* The data suggest that the sample of women should be roughly 2.7 times
as large as the sample of men. This ratio is obtained from
(.2)(48.20)/(.8)(17.89), in which .2 and .8 are the fractions of women
and men, respectively, in the population, and the other two factors are
empirical estimates of the standard deviations of the expenditures for
women and men, respectively. A justification of this result is beyond
the scope of this paper. Discussions of this analysis appear in [1],
[2], [3], and other discussions of stratified sampling.

We return to the problem of estimating the mean expenditure for all
persons, male and female. If a simple random sample of size 15 had been
drawn, one would divide the sum of the 15 data by 15. But this cannot be
done in the present instance because we have distorted the natural,
simple random sampling procedure. However, the analysis in the face of
this distortion is obvious. Since 80% of the guests are men, we compute
the following weighted mean of M and W, and we obtain the estimated mean
expenditure of all hotel guests:

$$\bar{x} = .8M + .2W = (.8)(60) + (.2)(245) = 97.$$

We  could  estimate  the  sampling  error  in  this  estimate.  We  shall  not 
do  so  because  the  error  analysis  is  not  needed  for  our  purposes. 

We turn next to a cruder and more cumbersome analysis of the data given
above. This alternative analysis is less appealing in the barber-beauty
shop problem. However, interesting analogies with Monte Carlo analysis
will appear.

Let us suppose that the sample was taken among the hotel guests
registered at a specific time (we ignore the fact that the statistical
properties of those guests may not accurately reflect the statistical
properties of all guests over a period of time). Let us suppose that
when the sample was drawn, there were 80 men and 20 women registered at
the hotel. If simple random sampling had been employed, 15 of the 100
guests, without consideration of sex, would have been selected in such a
manner that each guest had the probability .15 of being included in the
sample. Thus a random process is visualized which would select 15
guests. Before the process would be implemented, the particular 15
selected would be uncertain, but each of the 100 guests would have the
probability .15 of being selected in the sample.

This simple random sampling process was not employed. Rather the natural
process was distorted. Whereas any man would have the probability

$$p(M_i) = .15$$

of being included in a simple random sample, the probability was
distorted to

$$p^*(M_i) = 5/80 = .0625$$

under the distorted sampling procedure which selected 5 of the 80 male
guests. For any individual woman the probability of being included in a
simple random sample is

$$p(W_i) = .15,$$

and the probability of being included in a sample drawn by the distorted
process is

$$p^*(W_i) = 10/20 = .5.$$


Consider a particular man who was selected in the sample that was drawn.
To be specific, suppose that this man is the one with expenditure 100.
We shall designate him by $M_{100}$. For analytic purposes to be
revealed below, we compute for this man the weight

$$w(M_{100}) = \frac{p(M_{100})}{p^*(M_{100})} = \frac{.15}{.0625} = 2.4.$$


The interpretation of this weight is that $M_{100}$ would expect to
appear in simple random samples (if a large number of samples would be
drawn) 2.4 times as often as in samples drawn under the distorted
process. The distorted process underestimates the importance of
$M_{100}$ by the factor of 2.4. Suppose that the hotel guests numbered
thousands, instead of 100, and that there were many duplicates of
$M_{100}$. The distorted sampling process would include several
duplicates of $M_{100}$, but in simple random sampling one would expect
2.4 times as many of such duplicates. Hence in the analysis, which will
be carried out with use of formulas designed for simple random sampling,
we will count $M_{100}$ as 2.4 individuals.

Similarly, consider one of the women drawn into the sample, say
$W_{800}$. For her we have the weight

$$w(W_{800}) = \frac{p(W_{800})}{p^*(W_{800})} = \frac{.15}{.5} = .3.$$

If many simple random samples would be drawn, this lady would be drawn
into the sample approximately 30% as often as she could expect to be
chosen under the distorted process. Hence the distorted process
overestimates the importance of the lady by a factor of 1/.3 = 3.33. In
the analysis we should downgrade the lady's importance by counting her
as .3 of a person.

We return to the numerical sample. For each person actually drawn into
the sample we compute the weight. For each man the weight is 2.4 and for
each woman the weight is .3. We compute the arithmetic mean of the 15
numbers in the sample, but we count each man as 2.4 men and each woman
as .3 women. The result is a new estimate of $\bar{x}$, called
$\bar{x}'$, computed as

$$\bar{x}' = \frac{(2.4)(50) + \cdots + (2.4)(100) + (.3)(0) + \cdots + (.3)(800)}{15} = 97.$$

Fortunately $\bar{x}' = \bar{x}$. It is possible to prove that this
equality is to be anticipated. Such a proof is not presented in this
paper, except for a special case found in Appendix B. Proofs are given
in references [4] and [5].
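
The two analyses of the hotel data can be checked with a few lines of
Python. This is only an arithmetic sketch of the calculations described
above; the variable names are assumptions introduced for the
illustration.

```python
# Sample drawn under the distorted (stratified) process, in dollars.
men = [50, 50, 50, 50, 100]
women = [0, 50, 100, 100, 200, 200, 200, 300, 500, 800]

# First analysis: weight the two stratum means by the population fractions.
mean_men = sum(men) / len(men)          # M
mean_women = sum(women) / len(women)    # W
x_bar = 0.8 * mean_men + 0.2 * mean_women

# Second analysis: weight every individual by p / p*, where p is the
# inclusion probability under simple random sampling (15 of 100 guests)
# and p* is the inclusion probability under the distorted process.
p_simple = 15 / 100
weight_man = p_simple / (5 / 80)        # 2.4
weight_woman = p_simple / (10 / 20)     # 0.3
weighted_total = weight_man * sum(men) + weight_woman * sum(women)
x_bar_prime = weighted_total / (len(men) + len(women))

print(mean_men, mean_women, x_bar, x_bar_prime)
```

Both estimates come out at 97, up to floating-point rounding. The
weights of the 15 sampled persons add to 15, so dividing by the sample
size is the same as dividing by the total weight.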

The statistic $\bar{x}$ is simpler to comprehend than $\bar{x}'$.
However, the second statistic, or rather the basic ideas involved in the
definition of $\bar{x}'$, can be employed in a wide variety of
situations. In fact we can state the following general rule. As an
estimator of a population expected value we could use a sample mean
calculated from the elements of a simple random sample. Suppose,
however, that instead of simple random sampling we use a sampling
procedure in which the population elements have probabilities (or
likelihoods) of inclusion within the sample which are different from the
probabilities under simple random sampling. For each element x of the
population from which the sample is drawn, let p(x) and p*(x) be the
probabilities that the element x would be drawn into the sample under
simple random sampling and the alternative sampling process,
respectively. Consider the weight w(x) = p(x)/p*(x). We can still use
the sample mean as an estimator of the population expected value if we
weight each sample value by w(x). This rule will be illustrated and
clarified below.
4. Importance Sampling in Monte Carlo Analysis. We are ready to discuss
importance sampling. The discussion continues through the medium of
trivial numerical illustrations.

Consider the exponential distribution with probability density

$$p(x) = .01 \exp(-.01x), \qquad 0 \le x < \infty. \tag{1}$$

We shall estimate the probability that a sample value from this
distribution is less than 1. This probability is easy to obtain
analytically, being 1 - exp(-.01) = .00995 to five decimal places.
However, we shall attack the problem by a Monte Carlo analysis in order
to obtain a simple illustration involving sampling with distorted
probability distributions.

Suppose we were to generate a simple random sample from the distribution
(1). It would require a large sample to give an accurate estimate of the
probability that a sample value of (1) is less than 1. This is due to
the fact that approximately 1% of the sample values would be less than
1. Hence hundreds of sample values would be required before we would
know that the fraction is near .01.

In order to obtain a greater proportion of sample values within the
interval of importance, namely (0,1), we shall distort the sampling
procedure. We introduce the distribution with probability density

$$p^*(x) = \exp(-x). \tag{2}$$


If we sample from this distribution, which superficially has no
relevance to the problem, we shall achieve the result that a large
fraction of the sample values will fall within the importance interval
(0,1); the expected fraction is $1 - e^{-1} = .63$. Setting aside
momentarily any question of the sanity of our operation, let us consider
a sample from the distribution p*(x). Suppose that the first number
generated from p*(x) were 2. Let us consider the likelihoods of
generating this value 2 in both undistorted and distorted sampling. The
likelihood in case of undistorted sampling is obtained from (1) as
p(2) = .01 exp(-.02) = .0098020, and the likelihood of drawing this same
value 2 in distorted sampling is obtained from (2) as p*(2) = .13534.
The ratio of these likelihoods is

$$\frac{p(2)}{p^*(2)} = \frac{.0098020}{.13534} = .07$$

approximately. This implies that in undistorted sampling one can expect
approximately 7% as many sample values in the interval (2, 2 + dx) as
would be obtained under distorted sampling. But this means that one can
sample from p*(x), count the number of sample values between 2 and
2 + dx, and multiply by .07; in this way one has an unbiased estimate of
the number of sample values expected between 2 and 2 + dx under
undistorted sampling (and with the same sample size). In practice, if 2
were generated under distorted sampling, one would accept 2 not as one
value but as .07 of a value.

The numbers computed above for x = 2 appear in Table 1. This table also
contains similar results for other values of x. For example, Table 1
gives the weight 1.412 for the sample value x = 5. This implies that a
sample value within the interval (5, 5 + dx) can be expected 41.2% more
often with undistorted sampling than with distorted sampling. Several
such weights are listed in Table 1. The weights reflect the obvious fact
that small sample values are more likely to be generated from p*(x) but
large values are more likely from p(x).
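
The weights quoted in the text can be recomputed directly from densities
(1) and (2). The following Python sketch does so; the function names and
the particular x values printed are assumptions made for the
illustration, and the sketch is not a reproduction of Table 1 itself.

```python
import math


def p(x, rate=0.01):
    """Undistorted density (1): rate * exp(-rate * x)."""
    return rate * math.exp(-rate * x)


def p_star(x):
    """Distorted density (2): exp(-x)."""
    return math.exp(-x)


def weight(x, rate=0.01):
    """Likelihood ratio p(x) / p*(x) attached to a value drawn from p*."""
    return p(x, rate) / p_star(x)


for x in (0.2, 0.4, 2.0, 5.0):
    print(f"x = {x:3.1f}   p(x) = {p(x):.7f}   "
          f"p*(x) = {p_star(x):.5f}   w(x) = {weight(x):.4f}")
```

For these two densities the weight reduces to .01 exp(.99x), which makes
plain why it increases steadily with x.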

To illustrate the use of weighted sampling we have drawn a random sample
of size 10 from p*(x). The sample values of x are listed in Table 2. In
addition Table 2 gives each of the weights. Since we are estimating the
probability that x is less than 1, we consider the six values of x in
Table 2 that are less than 1. The value .31 is counted as .014 of an
observation, .17 as .012 of an observation, etc. The sum of the weights
for the six x's less than 1 is .096. Hence we count slightly less than
one-tenth of an x less than 1. Since the sample size is 10, we estimate
the probability that x in undistorted sampling will be less than 1 to be
.096/10 = .0096. This estimate is close to the true value .00995.
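
The same calculation can be carried out on a machine-generated sample.
The sketch below is a hedged illustration: it does not reproduce the
sample of Table 2, and the function name, the sample size of 1000, and
the seed are assumptions.

```python
import math
import random


def estimate_prob_below(threshold, rate, n, seed=0):
    """Estimate P(x < threshold) for density rate*exp(-rate*x),
    using n values drawn from the distorted density exp(-x)."""
    rng = random.Random(seed)
    total_weight = 0.0
    for _ in range(n):
        x = rng.expovariate(1.0)  # one draw from p*(x) = exp(-x)
        if x < threshold:
            # Count the value as p(x)/p*(x) of an observation.
            total_weight += rate * math.exp(-rate * x) / math.exp(-x)
    return total_weight / n


estimate = estimate_prob_below(threshold=1.0, rate=0.01, n=1000, seed=1)
exact = 1.0 - math.exp(-0.01)
print(f"weighted estimate: {estimate:.5f}   exact value: {exact:.5f}")
```

Values at or above the threshold contribute nothing to the sum, but they
still count toward the sample size n in the denominator, exactly as in
the hand calculation above.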

The procedure has been the following. If one were to sample from p(x),
approximately one out of a hundred sample values would be less than 1,
and it would require a large sample to produce adequate data for an
estimate of the probability that x is less than 1. We replaced p(x) by
p*(x), which generates a large fraction of its sample values less than
1. We observed that any sample value from p*(x) can be weighted in such
a way as to represent a number of sample values from the distribution of
p(x). This number (weight) is in some cases a small fraction and in
other cases much greater than 1. In the numerical illustration the
distorted sampling produced 6 of 10 sample values less than 1. But the
weighting procedure led to counting each of the 6 as a small fraction of
a single value when the values are to be interpreted as from the
distribution of p(x). The mathematical justification of the weighting
procedure and the estimate of the variance will not be made in this
paper.*


5. Common Distortions of Two or More Random Processes. In Section 4 we
estimated a parameter of the distribution p(x) given by (1). We did not
generate a sample from this distribution; instead, our sample was from
the distribution p*(x) given by (2). Let us observe that p*(x) can be
regarded as a distortion of many distributions. Hence the sample of
Table 2, drawn from p*(x), can be used for statistical analyses of many
distributions.

To clarify this matter by a numerical illustration, we consider the
distribution with probability density

$$p'(x) = .02 \exp(-.02x), \qquad 0 \le x < \infty.$$

We shall estimate the probability that x, randomly drawn from p'(x), is
less than or equal to 1. Our new problem is identical with the problem
of Section 4 except that p(x) is replaced by p'(x).


* See [4] or [5].
We shall use the same p*(x) as a distortion of p'(x). We proceed as in
Section 4 and obtain Table 3 in place of Table 2. The sum of the weights
in Table 3 for sample values in the interval (0,1) is .190. Dividing
this sum by the sample size 10, we obtain .0190 as the estimate of the
probability that x from p'(x) is less than or equal to 1. This estimate
can be compared with the true value .0198.

The  salient  feature  is  that  two  problems  have  been  solved  by  use 
of  the  same  sample  (the  first  columns  of  Tables  2  and  3  are  identical). 

In  a  serious  Monte  Carlo  most  of  the  computing  time  is  used  in  obtaining 
the  sample  values  from  the  distorted  distribution;  typically  the  time 
for  statistical  analysis  is  relatively  insignificant.  In  our  trivial 
example  this  does  not  happen  to  be  true.  But  if  we  should  assume  that 
the  major  part  of  the  computation  consisted  in  the  generation  of  the  first 
column  in  Tables  2  and  3,  we  would  conclude  that  we  have  solved  two 
problems  at  the  cost  essentially  of  a  single  analysis. 

In general, consider the probability distributions obtained by assigning
a set of values to $\lambda$ in $\lambda \exp(-\lambda x)$. Suppose that
for each of these distributions we wish to know the probability that x
is less than or equal to 1. All these problems can be solved from a
single sample drawn from p*(x). If the number of values assigned to
$\lambda$ is large, the savings obtained from distorted sampling can be
tremendous. (However, if the values of $\lambda$ differ greatly among
themselves, it is possible that a common distortion of all the
distributions may not be efficient for every $\lambda$. It might be
necessary to group the values of $\lambda$ into sets, and to handle the
sets separately. Such technicalities are beyond the scope of this
paper.)
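
The reuse of one distorted sample for several values of the rate
parameter can be sketched as follows. The function names, the sample
size, the seed, and the third rate .05 are assumptions added for
illustration; only the rates .01 and .02 come from the text.

```python
import math
import random


def draw_distorted_sample(n, seed=0):
    """Draw n values once from the common distortion p*(x) = exp(-x)."""
    rng = random.Random(seed)
    return [rng.expovariate(1.0) for _ in range(n)]


def estimate_from_sample(sample, rate, threshold=1.0):
    """Reweight an existing sample to estimate P(x <= threshold)
    under the density rate * exp(-rate * x)."""
    total_weight = sum(
        rate * math.exp(-rate * x) / math.exp(-x)
        for x in sample
        if x <= threshold
    )
    return total_weight / len(sample)


# The costly step (sampling) is done once; only the cheap reweighting
# is repeated for each rate.
sample = draw_distorted_sample(2000, seed=2)
rates = (0.01, 0.02, 0.05)
estimates = {rate: estimate_from_sample(sample, rate) for rate in rates}
exact = {rate: 1.0 - math.exp(-rate) for rate in rates}

for rate in rates:
    print(f"rate = {rate:.2f}   estimate = {estimates[rate]:.5f}   "
          f"exact = {exact[rate]:.5f}")
```

Only the weights change from one rate to the next, which is the sense in
which the first columns of Tables 2 and 3 are identical.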
6. Complex Stochastic Processes. In the example of Section 4 the
efficiency of the Monte Carlo analysis can be greatly increased by
distorted sampling. (We say that a first sampling procedure is k times
as efficient as a second procedure if the sample sizes $N_1$ and $N_2$,
respectively, required for a given sampling error satisfy
$N_2 = kN_1$.) Unfortunately most Monte Carlo analyses are applied to
more complex stochastic processes, and the dramatic savings of Section 4
are much harder to obtain. (But the procedure of Section 5 is no less
efficient.) We shall illustrate this fact by an example of a stochastic
process with two random elements.

Consider the random variable y which is distributed uniformly between 0
and 200. The probability density of y is

$$P(y) = \begin{cases} 1/200, & 0 \le y \le 200, \\ 0 & \text{otherwise.} \end{cases}$$


We also use the random variable x with probability density p(x) given by
(1). We assume x and y independently distributed. We shall study
z = x + y, and we consider the estimation by Monte Carlo analysis of the
probability that z is less than 1. To obtain values of z in the
important interval (0,1) we shall distort the distributions of both x
and y. The distortion of p(x) will be the same as that used above. The
distortion of P(y) will be the probability density

$$P^*(y) = \begin{cases} 1, & 0 \le y \le 1/2, \\ 1/399, & 1/2 < y \le 200, \\ 0 & \text{otherwise.} \end{cases}$$

Under distorted sampling for y, i.e., with y generated from the
distribution with density P*, the weight will be P(y)/P*(y) = .005 if
0 ≤ y ≤ 1/2 and P(y)/P*(y) = 1.995 if 1/2 < y ≤ 200.

Suppose that one should generate under distorted sampling x = .4 and
y = .4, and hence z = .8. Since x and y are independently distributed,
the likelihood of this pair of drawings under undistorted sampling is
p(x)P(y), and the likelihood under distorted sampling is p*(x)P*(y).
Hence the weight associated with the pair of values is

$$\frac{p(x)P(y)}{p^*(x)P^*(y)}.$$

This is the product of the weights associated with x and y individually.
For x = .4, the weight is seen in Table 1 to be .0149, and for y = .4,
the weight is given above as .005. Hence the weight associated with z
determined as .4 + .4 is (.0149)(.005) = .0000745.

Suppose that another pair of drawings gave x = .2 and y = .6, and
hence again z = .8. One easily checks that the weight associated with
the pair of generations is (.0122)(1.995) = .024339. The important aspect
of these results is that both pairs of generations produced the same z,
namely z = .8, but the weights are different, indeed greatly different.
Thus we do not have the monotonicity of the weight as a function of
distance which is apparent in Table 1. Such instability of the weights
can greatly reduce the efficiency of sampling distortions. This fact is
proved in Appendix A.
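The two weight calculations above can be reproduced with a few lines of Python. This is only a sketch: density (1) is not restated in this section, so the form of w(x) below is an assumption inferred from the entries of Table 1, namely p(x) = .01 exp(-.01x) and p*(x) = exp(-x). The function names are illustrative.

```python
import math

def weight_x(x):
    # Assumed from Table 1: p(x) = 0.01*exp(-0.01*x) and p*(x) = exp(-x),
    # so that w(x) = p(x)/p*(x) = 0.01*exp(0.99*x).
    return 0.01 * math.exp(0.99 * x)

def weight_y(y):
    # P(y) = 1/200 on [0, 200]; P*(y) = 1 on [0, 1/2) and 1/399 on [1/2, 200].
    if 0.0 <= y < 0.5:
        return 0.005
    if 0.5 <= y <= 200.0:
        return 1.995
    raise ValueError("y must lie in [0, 200]")

def weight_z(x, y):
    # x and y are independent, so the weight of the pair is the product.
    return weight_x(x) * weight_y(y)

first = weight_z(0.4, 0.4)    # z = .8, both components small
second = weight_z(0.2, 0.6)   # z = .8 again, but y falls beyond 1/2
print(first, second, second / first)
```

Both pairs give z = .8, yet the second weight is several hundred times the first. The unrounded weights differ slightly in the last digits from the products of the rounded table entries quoted in the text.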
Let us reflect on this example. The generation of a value of z
requires the generation of an x and a y. Under distorted sampling the
weight associated with z is the product of the weights associated with
x and y. Suppose that a small value of x is drawn. Since such a small
x is more likely under distorted sampling, w(x) is small. This tends
to make the weight of z small. This is fortunate because our objective
under distorted sampling is to get a large number of small values of z;
furthermore, the large number of z's must have small weights associated
with them to prevent bias in the statistical estimates.

However, this advantageous relation between x and w(x) does not
necessarily produce the same relation between z and w(z). If a small x
is added to a moderately large y, the sum is a z which is not small.
However, the weight w(z) might be small, being the product of a very
small w(x) and a value of w(y) near 1. In other words, although there
may be advantageous correlations between x and w(x) as well as between
y and w(y), it is an unfortunate fact that the resulting correlation
between x + y and w(x)w(y) may be weak. Thus we do not have the situation
in which all small values of z have small weights and all large
values of z have large weights. The serious implications of this
non-monotonic relation between z and w(z) may not be apparent. However, it
is proved in Appendix A that such instability of the weights can greatly
reduce the efficiency of the distorted sampling.
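The two-variable example can be run as a small weighted simulation. This is a sketch under stated assumptions: p(x) = .01 exp(-.01x) and p*(x) = exp(-x) are inferred from Table 1, not quoted from density (1). The sample size, the seed, and the names `sample_distorted` and `estimate` are illustrative. The exact value used for comparison follows from integrating P(x < 1 - y) against the uniform density of y.

```python
import math
import random

def sample_distorted(rng):
    """Draw (z, weight) with x and y both generated from distorted densities."""
    x = rng.expovariate(1.0)                 # p*(x) = exp(-x)  (assumed)
    w_x = 0.01 * math.exp(0.99 * x)          # p(x)/p*(x)       (assumed)
    if rng.random() < 0.5:                   # P* puts mass 1/2 on [0, 1/2)
        y = rng.uniform(0.0, 0.5)
        w_y = 0.005
    else:                                    # and mass 1/2 on [1/2, 200]
        y = rng.uniform(0.5, 200.0)
        w_y = 1.995
    return x + y, w_x * w_y

def estimate(n, seed=1):
    """Weighted estimate of P(z < 1) under undistorted sampling."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z, w = sample_distorted(rng)
        if z < 1.0:
            total += w
    return total / n

# Exact value: (1/200) * integral over u in (0,1) of (1 - exp(-0.01*u)).
exact = (1.0 - (1.0 - math.exp(-0.01)) / 0.01) / 200.0
est = estimate(200_000)
print(est, exact)
```

Most of the sampling error comes from the comparatively rare drawings with y slightly above ½, which carry weights several hundred times larger than the others. This is the instability of weights described in the text.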

Consider a complex Monte Carlo. There will be many random variables
x_1, ..., x_n, with n in some cases greater than 1,000. Instead of
the simple relation z = x + y, the outcome of the process z is some
complex function of x_1, ..., x_n. In general, the greater n, the more
difficult it is to achieve an effective correlation between this function
z and the product of the n weight factors.

In complex Monte Carlo analyses one tries to introduce distorted
sampling of one or more of the random variables in the process. The
objective is to obtain a relatively large amount of data within intervals
of importance. Furthermore these data should have small weights to
compensate for their large quantity. The data which fall outside the
intervals of importance should be few in number but for that reason have large
weights. For complex stochastic processes it is often difficult to
determine appropriate distortions.

7. A Less Unrealistic Illustration. We have described some aspects
of the statistical sampling problem in Monte Carlo analysis. These
discussions will be summarized through the medium of a numerical example
which is intended to bridge the gap between formalism and realistic
application. The discussion is based upon mathematical results which are
deferred to the appendices.

We shall study the process described above in which the running time
between failures of a machine is generated from the distribution with
probability density λ exp(-λx), 0 < x < ∞; at each time of failure the
machine dies (is scrapped) with probability p but is repaired with
probability q = 1 - p; in case of repair an additional running time is generated
from the same exponential distribution; the process continues until death is
generated at a time of failure. Let us suppose that the basic problem
is to determine the 1% quantile of the distribution of times to death.
In other words we wish a lower tolerance limit for this time to death
so that with 99% confidence one can assume that a machine's life will
exceed this tolerance limit. This 1% quantile, which we shall denote by
X, can be computed analytically, and this is done in Appendix B. For
this reason our example is simpler than most Monte Carlo simulations.
But the simplicity will permit analytic evaluations which are impossible
if a simulation is complex.

The running times and death or survival at each failure could be
simulated. The histories of several machines could be generated, and the
time of death recorded for each history. From the record of these
empirical times of death, one could estimate the 1% quantile X. Such an
estimate would have a large relative error unless the sample size were very
large. This is due to the fact that very few of the empirical data would
be within the interval of importance (0,X).
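The undistorted simulation just described can be sketched as follows. The parameter values λ = 1 and p = ½ and the quantile X = -2 log .99 are those of the paper's special case (see Appendix B). The function name, the sample size, and the seed are illustrative assumptions.

```python
import math
import random

def time_to_death(rng, lam=1.0, p=0.5):
    """Simulate one machine history and return its time to death."""
    t = 0.0
    while True:
        t += rng.expovariate(lam)   # running time until the next failure
        if rng.random() < p:        # the machine dies at this failure
            return t
        # otherwise it is repaired and another running time is generated

rng = random.Random(0)
n = 50_000
X = -2.0 * math.log(0.99)           # 1% quantile for lam = 1, p = 1/2
times = [time_to_death(rng) for _ in range(n)]
mean_time = sum(times) / n
frac_below = sum(t <= X for t in times) / n
print(mean_time, frac_below)
```

Only about one history in a hundred lands inside (0,X), which is why the undistorted estimate of a small quantile needs a very large sample.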

To obtain a greater fraction of the empirical results within the
importance interval (0,X), we can distort the Monte Carlo process. We
can replace the distribution of running times with one having a smaller
expected time between failures. Furthermore we can increase the
probability of death at each failure. We note that the expected value of the
random variable with probability density λ exp(-λx) is 1/λ (indeed, the
integral from 0 to ∞ of λx exp(-λx) is 1/λ). Hence if the exponential
distribution with parameter λ is replaced by the exponential distribution
with parameter λ*, with λ < λ*, the expected time between failures is
reduced from 1/λ to 1/λ*. In addition we can replace the probability
of death p by p* > p.

Thus one would hope to employ importance sampling advantageously
by increasing λ and p to λ* and p*. But appropriate values for λ* and
p* are not immediately obvious. Should both or only one of the parameters
be distorted? How great should the distortions be? Before discussing
how one might resolve these questions, we shall indicate the optimum
distortions in a special case.

To particularize the discussion we shall use λ = 1 and p = ½. For
several pairs of values of λ* and q* = 1 - p* we have computed the
efficiency of the distorted sampling relative to undistorted sampling. These
relative efficiencies appear in Table 4. For example Table 4 gives .00538
for the relative efficiency in case λ* = 80 and q* = .005. This means
that if the sampling error is preassigned, and if a sample of size N is
required under undistorted sampling to keep the error of estimate within
the given limit of error, a sample size of .00538N would be adequate to
attain the same accuracy if the parameters are distorted to λ* = 80 and
q* = .005.

We shall present a mathematical analysis of this numerical example
in Appendix B. But first we conclude the non-mathematical part of the
exposition with some general remarks. In a real problem one cannot
construct Table 4; if one has enough information to construct such a table,
it is likely that one could solve the problem analytically. Hence one
must obtain good distortions of the model parameters partly by guesswork.
Such guessing is not a simple matter. Table 4 reveals that some
distortions would be disastrous.

In the face of our illustrative problem one might reason as follows.
One cannot get a large fraction of small times to death x merely by
increasing p. This is due to the fact that the first time to failure
is generated before the probability p comes into play, and the first time
to failure already exceeds X in a large fraction of the histories. Hence
it is likely that large sampling savings will require a distortion of λ.
It would appear dangerous to rely solely on a distortion of p.

Without further insight into the times that would be generated
under undistorted sampling, one could not do much better than the
following. We note that it could be disastrous to use too large values of λ*
and p*; indeed, Table 4 indicates that λ* = 1 and q* = .0005 would be
bad, and we remark, without proof, that similar unfortunate results are
obtained if λ* is increased beyond the range in Table 4. Hence one might
generate two small samples of histories with small distortions to, say,
λ* = 2 and q* = .4 in one sample and λ* = 3 and q* = .3 in the other.
Estimates of the sampling errors in the two samples could suggest trial
values of the parameters in a third sample. This timid, tentative,
probing procedure has serious disadvantages. In the first place, the sampling
errors in the estimates of the sampling errors would be great with small
samples, and the empirical results might mislead one to believe, until
further data were available, that a distortion of λ to 2 is better than a
distortion to 3. More seriously, the analysis would be completed (with
small savings) before one used parameters anywhere near the optimum.
In some cases these difficulties cannot be circumvented.

If one can anticipate the results that would be obtained under simple
random sampling, one can act more effectively. Suppose one had reason to
believe that in undistorted sampling the expected value of the time to
death is near 2 (it is 2) and that the 1% quantile is near .020 (it is
.020). Then one would know that in undistorted sampling much of the
time-to-death data would be near 2, whereas we would like data near .020. This
suggests a distortion of the times to death which decreases the
expected value of the times by a factor of roughly 100. Use of λ* = 100
would effect such a distortion. Hence for the sake of simplicity one could
leave p undistorted and use λ* = 50 or λ* = 25 depending upon how timid
one is in the face of the fact that it is worse, usually, to overestimate
than to underestimate the optimum distortion.

It is possible to devise less elementary procedures for arriving at
a good estimate of an optimum distortion (but to the author's knowledge,
current literature does not indicate any simple, generally applicable
procedure for estimating accurately an optimum distortion). In general,
elementary considerations often cannot be sharper than the above thoughts.

These thoughts lead to the suggestion that in the hands of a
mathematical amateur, importance sampling can produce significant but moderate
savings in computing time. If a Monte Carlo is so large that high
computing costs are involved, it is likely that profit would result from
professional mathematical assistance in the design of the statistical sampling
procedures.




8. Estimates of Expected Values. The entire paper has been limited
to the discussion of one problem, the estimation of the probability
that the output of a Monte Carlo is less than X, where X is the 1%
quantile or some other quantile with a small percentage. This
probability is an expected value (of the random variable which is 1 if the
time to death is less than X, and is 0 otherwise). In general, Monte Carlo
analysis involves the estimation of expected values. However, different
information requirements arise in the estimation of different expected
values. Hence technical procedures vary from problem to problem.

Consider, for example, the numerical example discussed in Section 7.
One might wish to know the expected value of the distribution of times to
death. For an estimate of this expected value, the distortions used above
would be bad. The large times of death are more important than the small
ones in the sense that an efficient sample should contain more large times
than would be generated in simple random sampling. One would in this case
decrease λ and p. We shall not discuss this problem. Our purpose at this
point is to warn the reader not to assume that all Monte Carlo statistical
analyses are completely similar in details to the illustrations used in
this paper.
APPENDIX A

In Section 6, concerned with complex stochastic processes, we
considered a random process with two random elements x and y. The output
of the process z = x + y was such that different pairs of x and y could
produce the same value of z. In addition, under the distorted sampling
employed, the weights associated with the different pairs could be
different. Hence for fixed z the weights vary. It was stated that this
variability of the weights can greatly reduce the efficiency of the
distorted sampling. This result will follow from the sampling error formula
derived in Appendix A.

We have a random process, and we wish to estimate the probability
that the outcome of the process is within some interval I; in the problems
considered in this paper the interval I was in the form (0,X). A sample
is drawn with distorted distributions, and the empirical data consist of
a sequence of "distorted" outputs x_i*, i = 1, ..., N, and the corresponding
weights w_i; the outputs are sample values of a random variable. The
estimate of the probability that the output (in undistorted sampling) is
within I is

$$(3)\qquad t = N^{-1} \sum_{i=1}^{n} w_i,$$

where we assume that the x_i* have been so arranged that x_i* is within I
if and only if i ≤ n. (The statistic t is simply the number of sample
values within I divided by the sample size, each sample value being
counted w times.) We shall derive the sampling variance of t.

Let p* be the probability that an output x* will be within I under
distorted sampling. Then n is a binomially distributed integer with N
and p* as the binomial parameters. Hence the probability of n is

$$(4)\qquad P(n) = \binom{N}{n} (p^*)^{n} (q^*)^{N-n}, \qquad q^* = 1 - p^*.$$

The expected value of t is the sum over n of the expected value given n
multiplied by the probability of n; in symbols

$$(5)\qquad E(t) = \sum_{n=0}^{N} E(t \mid n)\, P(n).$$

Let E(w^i | x* ∈ I), i = 1, 2, be the conditional expected value of w^i under
the condition that the corresponding distorted output x* (i.e., the x*
which has w as its weight) is in I. Since each w_i which appears in (3)
corresponds to an x* within I, we have

$$E(t \mid n) = N^{-1}\, n\, E(w \mid x^* \in I).$$

This, (4) and (5) give

$$E(t) = N^{-1} E(w \mid x^* \in I) \sum_{n=0}^{N} n \binom{N}{n} (p^*)^{n} (q^*)^{N-n}.$$

Since the sum appearing in this expression is the expected value of n,
and since this expected value is Np* from the well-known theory of the
binomial distribution, we obtain

$$(6)\qquad E(t) = p^*\, E(w \mid x^* \in I).$$


In the calculation of E(t²) we shall encounter

$$\sum_{n=0}^{N} n^2 \binom{N}{n} (p^*)^{n} (q^*)^{N-n},$$

which is the expected value of the square of the binomially distributed
variable n, which is the variance plus the square of the expected value
of n, which is well-known to be

$$(7)\qquad N p^* q^* + (N p^*)^2.$$


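Using this result we calculate that

$$E(t^2) = \sum_{n=0}^{N} E\Big[\Big(N^{-1} \sum_{i=1}^{n} w_i\Big)^{2}\Big] \binom{N}{n} (p^*)^{n} (q^*)^{N-n}$$

$$= N^{-2} \sum_{n=0}^{N} \Big\{ n\, E(w^2 \mid x^* \in I) + n(n-1)\,[E(w \mid x^* \in I)]^2 \Big\} \binom{N}{n} (p^*)^{n} (q^*)^{N-n}$$

(because of the independence of w_i and w_j, i ≠ j, we have E(w_i w_j)
= [E(w)]² for all n(n - 1)/2 pairs.) We replace n(n - 1) by n² - n
and use (7) to obtain

$$E(t^2) = N^{-2} \Big\{ E(w^2 \mid x^* \in I)\, N p^* + [E(w \mid x^* \in I)]^2 \big[ N p^* q^* + (N p^*)^2 - N p^* \big] \Big\}$$

$$= N^{-1} p^* E(w^2 \mid x^* \in I) + \big( p^{*2} - N^{-1} p^{*2} \big) [E(w \mid x^* \in I)]^2$$

because 1 - q* = p*.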

The variance of t is

$$V(t) = E(t^2) - [E(t)]^2,$$

which with use of (6) reduces to

$$V(t) = N^{-1} \Big\{ p^* E(w^2 \mid x^* \in I) - p^{*2} [E(w \mid x^* \in I)]^2 \Big\}$$

$$= N^{-1} \Big\{ p^* V(w \mid x^* \in I) + p^* q^* [E(w \mid x^* \in I)]^2 \Big\}$$

because E(w²) = V(w) + [E(w)]² and p* - p*² = p*q*.
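The two forms of V(t) can be compared numerically. The sketch below is only a check of the algebra: the sample size, p*, and the conditional moments of w are hypothetical values chosen for illustration, and the function names are not from the paper.

```python
def var_t_from_moments(N, p_star, Ew, Ew2):
    # V(t) = N^-1 { p* E(w^2 | x* in I) - p*^2 [E(w | x* in I)]^2 }
    return (p_star * Ew2 - p_star ** 2 * Ew ** 2) / N

def var_t_from_variance(N, p_star, Ew, Vw):
    # V(t) = N^-1 { p* V(w | x* in I) + p* q* [E(w | x* in I)]^2 }
    q_star = 1.0 - p_star
    return (p_star * Vw + p_star * q_star * Ew ** 2) / N

# Hypothetical values: N histories, a fraction p* of them inside I,
# conditional mean and variance of the weights of those histories.
N, p_star = 1000, 0.4
Ew, Vw = 0.025, 1.0e-4

a = var_t_from_moments(N, p_star, Ew, Vw + Ew ** 2)
b = var_t_from_variance(N, p_star, Ew, Vw)
constant_weights = var_t_from_variance(N, p_star, Ew, 0.0)
print(a, b, constant_weights)
```

With the conditional mean held fixed, any spread of the weights among the outputs inside I adds the term p*V(w | x* ∈ I) to the variance. This is the sense in which instability of the weights reduces efficiency.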

This result might lead one to conclude that the sampling error can
be reduced by making p* small, i.e., by use of a distortion which produces
a small amount of data within the interval of importance I. However such
a conclusion would be fallacious. If p* were small, the values of w,
given x* ∈ I, would be large. The increase in E(w | x* ∈ I) and V(w | x* ∈ I) would
greatly outweigh the decrease in p*. We shall not pause to develop
this point. On the other hand, p* and q* can never exceed 1. Hence
small values of the expected value and variance of w, given x* ∈ I, will
produce a small sampling error. We recall that a large expected number
of distorted times x* within I must imply small E(w | x* ∈ I). This in
turn implies that most values of w, given x* ∈ I, should be small, and
hence the variance of w, given x* ∈ I, should be small.
APPENDIX B

Appendix B contains a mathematical analysis of the illustrative
random process used above. The time between failures is distributed
with probability density

$$(8)\qquad \lambda \exp(-\lambda x).$$

At each failure death occurs with probability p, but repair is effected
with probability q = 1 - p. When death occurs the process terminates,
but a repair is followed by another time to failure. We shall compute
the distribution of times to death. But most of Appendix B is taken up
with computation of weights and sampling error under importance sampling.
The error to be studied is the sampling error in a sample estimate of the
probability that a time to death does not exceed X, where X is any real
number. An interesting value for X is the 1% quantile of the times to
death.

As an auxiliary formula we shall derive the probability density of
x = x_1 + ... + x_n where the x_i are independently distributed with
probability density (8). We shall prove that this probability density is

$$(9)\qquad P(x \mid n) = \frac{\lambda^{n} x^{n-1} e^{-\lambda x}}{(n-1)!}.$$

For n = 1 the relation is obvious. To verify the general case by
induction we write

$$P(x \mid n) = P(x_1 + \dots + x_n) = \int_0^{x} \frac{\lambda^{n-1} t^{n-2} e^{-\lambda t}}{(n-2)!}\, \lambda e^{-\lambda (x-t)}\, dt,$$

which reduces to (9).
Let n be the number of failures prior to death, and let x be the
time to death. The probability that death occurs at the end of the
n-th time between failures is

$$(10)\qquad P(n) = q^{n-1} p.$$

The probability density of x given n is (9). Hence the probability
density of x is

$$P(x) = \sum_{n=1}^{\infty} P(x \mid n)\, P(n) = \sum_{n=1}^{\infty} \frac{\lambda^{n} x^{n-1} e^{-\lambda x}}{(n-1)!}\, q^{n-1} p,$$

which reduces to

$$(11)\qquad P(x) = p\lambda \exp(-p\lambda x).$$

Thus as stated in Section 2 on Monte Carlo, our illustrative problem
can be solved analytically. If λ = 1 and p = ½ the 1% quantile of x,
denoted by X, is obtained from

$$\int_0^{X} \tfrac{1}{2} \exp(-\tfrac{1}{2} t)\, dt = .01,$$

and this gives X = -2 log .99 = .020 as stated above. However, we have
introduced a Monte Carlo analysis of this problem because of the
possibility of the following mathematical evaluation of the importance sampling
procedure. Such a mathematical evaluation would be impossible in case
of a complex Monte Carlo.

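We assume that importance sampling is employed in which λ and p
are distorted to λ* and p*. If a distorted history leads to death at
the time of the n-th failure, and if x_i*, i = 1, ..., n, are the times between
failures, the weight is the product

$$\frac{\lambda \exp(-\lambda x_1^*)\, q}{\lambda^* \exp(-\lambda^* x_1^*)\, q^*} \cdot \frac{\lambda \exp(-\lambda x_2^*)\, q}{\lambda^* \exp(-\lambda^* x_2^*)\, q^*} \cdots \frac{\lambda \exp(-\lambda x_n^*)\, p}{\lambda^* \exp(-\lambda^* x_n^*)\, p^*},$$

which we write

$$(12)\qquad w(x^*, n) = \left(\frac{\lambda}{\lambda^*}\right)^{n} \left(\frac{q}{q^*}\right)^{n-1} \frac{p}{p^*}\, \exp[-(\lambda - \lambda^*)\, x^*],$$

where

$$x^* = x_1^* + \dots + x_n^*$$

is the (distorted) time to death.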
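The product of per-step likelihood ratios can be checked against its collapsed form, (λ/λ*)^n (q/q*)^(n-1) (p/p*) exp[-(λ - λ*)x*], in a few lines. This is a sketch: the function names, the sample history, and the distorted parameter values are illustrative assumptions.

```python
import math

def history_weight(times, lam, p, lam_star, p_star):
    """Weight of one distorted history as a product of per-failure ratios."""
    q, q_star = 1.0 - p, 1.0 - p_star
    n = len(times)
    w = 1.0
    for i, x in enumerate(times, start=1):
        density_ratio = (lam * math.exp(-lam * x)) / (
            lam_star * math.exp(-lam_star * x))
        # Repair at every failure except the last, death at the last.
        outcome_ratio = p / p_star if i == n else q / q_star
        w *= density_ratio * outcome_ratio
    return w

def closed_form_weight(total_time, n, lam, p, lam_star, p_star):
    """The same weight written in terms of n and the total time to death."""
    q, q_star = 1.0 - p, 1.0 - p_star
    return ((lam / lam_star) ** n * (q / q_star) ** (n - 1) * (p / p_star)
            * math.exp(-(lam - lam_star) * total_time))

times = [0.010, 0.004, 0.021]       # hypothetical times between failures
w_prod = history_weight(times, 1.0, 0.5, 50.0, 0.7)
w_closed = closed_form_weight(sum(times), len(times), 1.0, 0.5, 50.0, 0.7)
print(w_prod, w_closed)
```

The weight depends on the history only through n and the total time x*, so a simulation need not store the individual running times.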

Our problem is to estimate the probability (in undistorted sampling)
that a time to death x is less than or equal to X. If N is the sample
size, and if the index i indicates the i-th sample history, our estimator
is

$$N^{-1} \sum_{i=1}^{N} C_X(x_i^*)\, w(x_i^*, n_i),$$

where C_X(t) = 1 if t ≤ X and = 0 if t > X. This means that we count
the histories for which x* ≤ X and divide by the number of histories
in the sample, the i-th history being counted as w(x_i*, n_i) histories. We
shall study the random variable

$$t = C_X(x^*)\, w(x^*, n)$$

because the variance of the sampling statistic is N^{-1} V(t), where V
denotes variance.

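For the expected value of t we have

$$E(t) = \int_0^{\infty} \sum_{n=1}^{\infty} C_X(x^*)\, w(x^*, n)\, P(x^* \mid n)\, P(n)\, dx^*,$$

where P(x* | n) and P(n) are (9) and (10) evaluated with the distorted
parameters λ*, p* and q*. The factor C_X(x*) can be deleted if we integrate
over (0,X) rather than (0,∞). Hence using (9), (10), and (12) we obtain

$$E(t) = \int_0^{X} \sum_{n=1}^{\infty} \left(\frac{\lambda}{\lambda^*}\right)^{n} \left(\frac{q}{q^*}\right)^{n-1} \frac{p}{p^*}\, e^{-(\lambda - \lambda^*) x^*}\, \frac{(\lambda^*)^{n} (x^*)^{n-1} e^{-\lambda^* x^*}}{(n-1)!}\, (q^*)^{n-1} p^*\, dx^*$$

$$= 1 - \exp(-p\lambda X).$$

The reader can check using (11) that E(t) is the probability that x ≤ X.
This justifies in this particular case the assumption made without proof
that the weighted output of distorted sampling produces an unbiased estimate.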
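The unbiasedness just derived can be observed in a small simulation of the weighted estimator. This is a sketch under stated assumptions: λ = 1, p = ½ and X = -2 log .99 are the paper's special case, while the distortion λ* = 50 with p left undistorted, the sample size, the seed, and all function names are illustrative choices.

```python
import math
import random

def distorted_history(rng, lam_star, p_star):
    """Return (time to death, number of failures) under distorted sampling."""
    t, n = 0.0, 0
    while True:
        t += rng.expovariate(lam_star)
        n += 1
        if rng.random() < p_star:
            return t, n

def weight(t, n, lam, p, lam_star, p_star):
    # Likelihood ratio of a history with n failures and total time t.
    q, q_star = 1.0 - p, 1.0 - p_star
    return ((lam / lam_star) ** n * (q / q_star) ** (n - 1) * (p / p_star)
            * math.exp(-(lam - lam_star) * t))

def estimate(N, X, lam=1.0, p=0.5, lam_star=50.0, p_star=0.5, seed=0):
    """Weighted estimate of the undistorted probability that x <= X."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(N):
        t, n = distorted_history(rng, lam_star, p_star)
        if t <= X:                  # the weight is needed only inside (0, X)
            total += weight(t, n, lam, p, lam_star, p_star)
    return total / N

X = -2.0 * math.log(0.99)
est = estimate(20_000, X)
exact = 1.0 - math.exp(-0.5 * 1.0 * X)   # 1 - exp(-p*lam*X) = .01
print(est, exact)
```

A sample of this size under undistorted sampling would contain only about 200 histories inside (0,X); under the distortion a large share of the histories fall there, each with a small weight.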


We obtain E(t²) by replacing w by w² in the initial expression for
E(t). Reductions similar to those above give

$$E(t^2) = \frac{p^2 \lambda^2 q^*}{p^* \big( q^* \lambda^{*2} - 2 q^* \lambda^* \lambda + q^2 \lambda^2 \big)} \left\{ -1 + \exp\!\left[ X \left( \lambda^* - 2\lambda + \frac{q^2 \lambda^2}{q^* \lambda^*} \right) \right] \right\}.$$

The variance of t is obtained as E(t²) - [E(t)]², and Table 4 is obtained
from the specialization of V(t) with λ = 1 and p = ½. Table 4 is this
V(t) divided by the value of this variance in undistorted sampling, that
is, when λ* = λ and p* = p.
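The variance ratio tabulated in Table 4 can be evaluated directly. The sketch below uses a closed form for E(t²) obtained here by carrying out the sum and integral with w² in place of w, using (9), (10) and (12). It should be read as an assumption-laden illustration, not as the author's computing procedure, and its values need not reproduce the printed entries digit for digit. The function names are illustrative.

```python
import math

def second_moment(X, lam, p, lam_star, p_star):
    """E(t^2) for the indicator-times-weight variable t.

    Requires 0 < p_star < 1, and is undefined where `rate` below is zero.
    """
    q, q_star = 1.0 - p, 1.0 - p_star
    # Exponent coefficient left after summing the series over n.
    rate = lam_star - 2.0 * lam + (q * lam) ** 2 / (q_star * lam_star)
    denom = p_star * (q_star * lam_star ** 2
                      - 2.0 * q_star * lam_star * lam
                      + (q * lam) ** 2)
    return p ** 2 * lam ** 2 * q_star / denom * (math.exp(X * rate) - 1.0)

def variance(X, lam, p, lam_star, p_star):
    mean = 1.0 - math.exp(-p * lam * X)      # E(t), the unbiased value
    return second_moment(X, lam, p, lam_star, p_star) - mean ** 2

def relative_variance(X, lam, p, lam_star, p_star):
    """V(t) under distortion divided by V(t) with lam* = lam, p* = p."""
    return (variance(X, lam, p, lam_star, p_star)
            / variance(X, lam, p, lam, p))

X = -2.0 * math.log(0.99)
for lam_star, q_star in [(1.0, 0.5), (80.0, 0.005), (1.0, 0.0005)]:
    print(lam_star, q_star,
          relative_variance(X, 1.0, 0.5, lam_star, 1.0 - q_star))
```

The three cases printed are the undistorted one, a distortion near the favorable region of Table 4, and the distortion of q alone that the text describes as disastrous.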


TABLE 1

Weights related to the use of p*(x) as a distortion of p(x)

    x       p(x)         p*(x)         w(x) = p(x)/p*(x)

    .1     .0099900     .90484           .0110
    .2     .0099800     .81873           .0122
    .3     .0099700     .74082           .0135
    .4     .0099601     .67032           .0149
    .5     .0099501     .60653           .0164
    .6     .0099402     .54881           .0181
    .7     .0099302     .49659           .0200
    .8     .0099203     .44933           .0221
    .9     .0099104     .40657           .0244
   1.0     .0099005     .36788           .0269
   2.0     .0098020     .13534           .0724
   3.0     .0097045     .049787          .1949
   4.0     .0096079     .018316          .5246
   5.0     .0095123     .0067379        1.412
   6.0     .0094176     .0024788        3.799
   7.0     .0093239     .00091188      10.22
   8.0     .0092312     .00033546      27.52
   9.0     .0091393     .00012341      74.06
  10.0     .0090484     .000045400    199.3
TABLE 2

Sample from p*(x) as a distortion of p(x)

  Sample x
  from p*(x)     p(x)        p*(x)      w(x) = p(x)/p*(x)

    2.71        .009733     .06654        .146
     .31        .009969     .7334         .014
     .17        .009983     .8437         .012
     .02        .009998     .9802         .010
     .59        .009941     .5543         .018
     .54        .009946     .5828         .017
    4.15        .009594     .01576        .609
     .91        .009909     .4025         .025
    2.72        .009732     .06588        .148
    1.15        .009886     .3166         .031

  Sum of w(x) over x < 1  =  .014 + .012 + .010 + .018 + .017 + .025  =  .096

  (10)^{-1} (sum of w(x) over x < 1)  =  .0096
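The arithmetic of Table 2 can be repeated as follows. This is a sketch: the weight function assumes p(x) = .01 exp(-.01x) and p*(x) = exp(-x), which is inferred from the tabulated values, not quoted from density (1). The ten sample values are those listed in the table.

```python
import math

# The ten values of x drawn from p*(x), as listed in Table 2.
sample = [2.71, 0.31, 0.17, 0.02, 0.59, 0.54, 4.15, 0.91, 2.72, 1.15]

def w(x):
    # Assumed weight p(x)/p*(x) = 0.01*exp(0.99*x), inferred from the tables.
    return 0.01 * math.exp(0.99 * x)

# Weighted estimate of P(x < 1): sum the weights of the sample values
# below 1 and divide by the full sample size.
estimate = sum(w(x) for x in sample if x < 1.0) / len(sample)

# Value under the assumed undistorted density, for comparison.
exact = 1.0 - math.exp(-0.01)
print(estimate, exact)
```

Six of the ten drawings fall below 1, whereas under undistorted sampling a sample of ten would almost never contain even one such value.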

TABLE 3

Sample  from  p  (x)  as  a  distortion  of  p'(x) 
TABLE 4

Sample error in distorted sampling divided by the error in undistorted
sampling.

                                         q*
  λ*      .5        .3        .1        .05       .01       .005      .001       .0005

    1   1.00000   .70904    .55679    .54068    .64145    .84770   14.58096  1087.46222
    5    .19739   .13826    .10571    .10018    .10013    .10518    1.78017     3.33124
   10    .09908   .06798    .05075    .04552    .04653    .04762     .06128      .08494
   20    .09055   .03327    .02370    .02197    .02100    .02126     .02470      .02989
   40    .02770   .01694    .01097    .00988    .00919    .00923     .01027      .01143
   60    .02167   .01264    .00763    .00671    .00610    .00610     .00668      .00754
   80    .02032   .01167    .00687    .0059?    .00540    .00530     .00580      .00643
  100    .02132   .01239  [illegible] .00652    .00590    .00507     .00622      .00695
  120    .02409   .01436    .00896    .00798    .00729    .00725     .00758      .00807
  140    .02854   .01734    .01145    .01032    .00949    .00949     .00981      .01030
  160    .03492   .02210    .01499    .01368    .01278    .01270     .01303      .01353
  180    .04361   .02833    .01932    .01823    .01718    .01709     .01743      .01000